Add Phase 3 PR B: covariates, trends, and extensions for dCDH#302
Implements ROADMAP items 3a-3e and 3i:
- DID^X covariate residualization (Web Appendix Section 1.2)
- DID^{fd} group-specific linear trends (Section 1.3, Lemma 6)
- State-set-specific trends (Section 1.4)
- Heterogeneity testing (Section 1.5, Lemma 7)
- Design-2 switch-in/switch-out convenience wrapper (Section 1.6)
- R parity tests for controls and trends_lin scenarios
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P1 fixes:
- DID^X residualization no longer leaks into the per-period path:
  per_period_effects uses raw Y_mat; only the multi-horizon path sees
  residualized outcomes
- Added to_dataframe levels for heterogeneity and linear_trends

P2 fixes:
- Covariate coercion no longer mutates the caller's DataFrame
- Vectorized residualization (einsum replaces nested loop)
- Heterogeneity test guards against rank-deficient OLS
- Added estimand contract test for controls + L_max=1
- REGISTRY note clarifies per_period_effects stays unadjusted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
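The einsum vectorization mentioned in the P2 list can be sketched as follows. The names (`residualize_outcomes`, `y_mat`, `x_mat`, `theta_hat`) and shapes are illustrative assumptions, not the PR's actual internals:

```python
import numpy as np

def residualize_outcomes(y_mat, x_mat, theta_hat):
    """Subtract the fitted covariate contribution from each outcome cell.

    Assumed shapes: y_mat is (G, T) groups x periods, x_mat is (G, T, K)
    covariates, theta_hat is (K,) first-stage coefficients.
    """
    # Nested-loop version that einsum replaces:
    #   for g in range(G):
    #       for t in range(T):
    #           y_res[g, t] = y_mat[g, t] - x_mat[g, t] @ theta_hat
    # Vectorized: contract the covariate axis K in a single call.
    return y_mat - np.einsum("gtk,k->gt", x_mat, theta_hat)
```

The einsum form computes the same quantity as the loop but in one BLAS-backed contraction, which is why it was preferred here.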
Overall Assessment: ⛔ Blocker

Performance: No findings.
Maintainability: No findings.
Tech Debt: No findings beyond the test/doc gap below; none of the substantive issues above are tracked in TODO.md:51.
Security: No findings.
…nt_action
P0: Cumulated DID^{fd} SE now requires ALL component SEs to be finite;
non-finite SE at any horizon propagates NaN (was silently dropped).
P1: trends_linear + L_max>=2 overall surface now reports cumulated level
effects from linear_trends_effects (was second-difference delta).
cost_benefit_delta suppressed under trends_linear (meaningless on
second-differences).
P2: rank_deficient_action threaded through _compute_covariate_residualization
and _compute_heterogeneity_test (was hardcoded "warn").
P3: fit() docstrings updated for controls, trends_linear, trends_nonparam
(were stale "Reserved for Phase 3" text).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
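The P0 finite-SE guard can be illustrated with a minimal sketch. Summing component variances assumes independence across horizons, which is a simplification for illustration only; the actual influence-function-based SE construction is not shown in this commit text:

```python
import numpy as np

def cumulate_se(component_ses):
    """Cumulate per-horizon SEs into a level-effect SE, NaN-propagating.

    If any component SE is non-finite, the cumulated SE is NaN rather
    than being computed from the finite subset (the old behavior that
    silently dropped bad horizons and understated uncertainty).
    """
    ses = np.asarray(component_ses, dtype=float)
    if not np.all(np.isfinite(ses)):
        return float("nan")
    # Illustration only: independent components, so variances add.
    return float(np.sqrt(np.sum(ses ** 2)))
```

The key behavioral change is the early NaN return: a single bad horizon now makes the cumulated SE visibly unusable instead of quietly shrinking it.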
/ai-review

🔁 AI review rerun (requested by @igerber)
…geneity
P1: trends_linear + L_max>=2 overall_* is now NaN (R does not compute
an aggregate in trends_lin mode). Cumulated effects available via
results.linear_trends_effects[l].
P1: heterogeneity + controls now raises ValueError (matching R's
predict_het which disallows controls). REGISTRY documents
heterogeneity as partial implementation (post-treatment only,
no placebo regressions or joint null test).
P3: Added fit() docstrings for heterogeneity and design2 parameters.
P3: Updated to_dataframe() error text with new level names.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review

🔁 AI review rerun (requested by @igerber)
Methodology: Affected methods in this PR are DID^X covariate adjustment (…
Code Quality: No separate findings beyond the missing parameter-validation/propagation issues above.
Performance: No findings.
Maintainability: No separate findings beyond the API-contract drift noted below.
Tech Debt: No mitigating …
Security: No findings.
P1: heterogeneity without L_max now raises ValueError (was silent no-op).
P1: heterogeneity rejects trends_linear and trends_nonparam (would produce
coefficients for wrong estimand since het test uses raw level changes).
P1: design2=True with drop_larger_lower=True now raises ValueError (the
2-switch groups Design-2 needs are dropped by the default filter).
P3: NaN overall row under trends_linear+L_max>=2 now labeled as
"DID^{fd}_l (see linear_trends_effects)" instead of "Cost-Benefit Delta".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review

🔁 AI review rerun (requested by @igerber)
Overall Assessment: ⛔ Blocker
P0: DID^X first-stage OLS now includes intercept when dropping one
time dummy as reference (was forcing omitted period FE to zero).
theta_hat extraction updated to skip intercept at index 0.
P0: Heterogeneity regression now includes intercept when dropping one
cohort dummy as reference (was forcing omitted cohort mean to zero).
beta_het extraction updated to read index 1 instead of 0.
P1: DID^X now guards against NaN control coefficients from rank-deficient
OLS. If any theta_hat entry is NaN, residualization is skipped for
that baseline with a UserWarning.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
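A minimal sketch of the P0 intercept fix, with hypothetical names and shapes (the real first stage is more involved):

```python
import numpy as np

def fit_with_reference_dummy(y, dummies, x_controls):
    """First-stage OLS with an intercept plus dummies minus one reference.

    Dropping one dummy column *without* an intercept forces the omitted
    category's fixed effect to zero; adding the intercept lets the
    omitted category be absorbed into the constant instead.
    Assumed shapes: y (n,), dummies (n, d) one-hot, x_controls (n, k).
    """
    design = np.column_stack([
        np.ones(len(y)),   # intercept: the P0 fix
        dummies[:, 1:],    # drop the first dummy as reference
        x_controls,
    ])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Control coefficients now start after intercept + (d - 1) dummies,
    # i.e. at index d, not index d - 1 as before the fix.
    theta_hat = coef[dummies.shape[1]:]
    return theta_hat
```

Without the intercept column, the normal equations attribute the omitted period's mean to the controls, which is exactly the bias the commit describes.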
/ai-review

🔁 AI review rerun (requested by @igerber)
Code Quality: No separate findings beyond the methodology issues above.
Performance: No findings.
Maintainability: No separate findings beyond the methodology/contract issues above.
Tech Debt: No matching …
Security: No findings.
P1: DID^X rank-deficiency now residualizes with finite subset of
theta_hat (zeroing NaN coefficients) instead of skipping entirely.
P1: trends_nonparam now rejects set definitions that are not coarser
than group (singleton sets have no within-set controls).
P1: heterogeneity restrictions on trends_linear and trends_nonparam
now documented in REGISTRY.md and fit() docstring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
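The finite-subset residualization can be sketched in a few lines; the zeroing convention follows the commit, while the helper name is illustrative:

```python
import numpy as np

def finite_theta(theta_hat):
    """Zero out NaN entries so residualization uses the finite subset.

    A rank-deficient first stage can leave NaN coefficients on collinear
    controls; zeroing them means those controls contribute nothing to
    the adjustment, instead of poisoning every residualized cell (the
    previous behavior skipped the whole baseline).
    """
    theta = np.asarray(theta_hat, dtype=float)
    return np.where(np.isfinite(theta), theta, 0.0)
```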
/ai-review

🔁 AI review rerun (requested by @igerber)
Overall Assessment: ⛔ Blocker
Code Quality: No separate findings beyond the methodology issues above.
Performance: No findings.
Maintainability: No separate findings beyond the propagation issue above.
Tech Debt: No matching …
Security: No findings.
P0: trends_nonparam set_ids now threaded into both placebo functions
(_compute_multi_horizon_placebos and _compute_per_group_if_placebo_horizon)
so placebo diagnostics use same-set controls when state-set trends active.
P1: DID^X first-stage now guards n_obs < n_params before calling solve_ols.
Skips residualization for that baseline with warning instead of crashing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
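A hedged sketch of the n_obs guard; `safe_first_stage` is a hypothetical stand-in for the guarded call site, not the actual `solve_ols` wrapper:

```python
import warnings

import numpy as np

def safe_first_stage(x, y):
    """Guard the first-stage OLS against under-identified strata.

    Returns None (meaning: skip residualization for this baseline) when
    there are fewer observations than parameters, which would otherwise
    make the solve crash or return meaningless coefficients.
    """
    n_obs, n_params = x.shape
    if n_obs < n_params:
        warnings.warn(
            f"first stage skipped: {n_obs} obs < {n_params} params"
        )
        return None
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coef
```

A None return lets the caller fall through to the unadjusted path for that baseline, matching the commit's skip-with-warning behavior.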
/ai-review

🔁 AI review rerun (requested by @igerber)
Code Quality: No separate findings beyond the methodology issues above.
Performance: No findings.
Maintainability: No separate findings beyond the public-surface labeling issue below.
Tech Debt: No matching entry in TODO.md:51 mitigates the P1s above; the existing dCDH TODOs only cover Phase 1 per-period placebo SE and mixed-direction parity.
Security: No findings.
P1: Failed DID^X first-stage strata (n_obs=0 or n_obs < n_params) now
have outcomes set to NaN so they're excluded from downstream DID
computation. Previously left unadjusted, mixing raw + adjusted.
P1: Cell-weight deviation documented in REGISTRY (equal cell weights
vs R's N_gt observation-count weights - same Phase 1 convention).
P3: Event study rows in to_dataframe("event_study") now labeled as
DID^X_h, DID^{fd}_h, DID^{X,fd}_h when adjustments active.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review

🔁 AI review rerun (requested by @igerber)
Overall Assessment: Needs changes
…bels
P1: Failed DID^X first-stage strata now have N_mat zeroed out (not just
Y_mat NaN'd). The downstream eligibility checks (N_mat[g,idx] > 0)
in all DID/IF/placebo functions naturally exclude these groups from
N_l/N_pl_l, preventing NaN poisoning of otherwise estimable horizons.
P3: _estimand_label() now returns DID^{X,fd}_l (not DID^{fd}_l) when
both controls and trends_linear active with L_max>=2.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
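The N_mat-zeroing mechanism can be sketched as follows; the matrix names mirror the commit (`Y_mat`, `N_mat`) but the helper and shapes are illustrative:

```python
import numpy as np

def exclude_failed_strata(y_mat, n_mat, failed_groups):
    """Exclude failed first-stage strata via the weight matrix.

    NaN-ing Y_mat alone lets NaN leak into weighted sums at horizons
    that are otherwise estimable. Zeroing the matching N_mat rows means
    the existing eligibility checks (N_mat[g, t] > 0) skip these groups
    entirely, so finite horizons stay finite.
    Assumed shapes: y_mat, n_mat are (G, T); failed_groups holds group
    row indices.
    """
    y = y_mat.copy()
    n = n_mat.copy()
    for g in failed_groups:
        y[g, :] = np.nan   # mark the outcomes as unusable
        n[g, :] = 0.0      # and remove them from every weight check
    return y, n
```

Any downstream aggregate that masks on `n > 0` then never touches the NaN cells, which is the "naturally exclude" behavior the commit relies on.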
/ai-review

🔁 AI review rerun (requested by @igerber)
Overall Assessment: ⛔ Blocker
…guard
P0: normalized_effects suppressed under trends_linear (was normalizing
second-differences DID^{fd}_l instead of level effects). REGISTRY
documents that normalized_effects and cost_benefit_delta are both
unavailable under trends_linear.
P1: Non-finite (Inf) control values now rejected with ValueError during
DID^X validation (was silently collapsing first-stage OLS).
P3: summary() event study header and row labels now use _horizon_label()
(DID^X_l, DID^{fd}_l, DID^{X,fd}_l matching to_dataframe).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
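The non-finite control validation might look like this minimal sketch; `validate_controls` is a hypothetical helper, not the PR's actual validation path:

```python
import numpy as np
import pandas as pd

def validate_controls(df: pd.DataFrame, controls: list[str]) -> None:
    """Reject NaN or Inf control values up front with a clear error.

    An Inf in a control column silently collapses the first-stage OLS
    (the solve returns meaningless coefficients), so failing fast with
    ValueError beats propagating garbage downstream.
    """
    vals = df[controls].to_numpy(dtype=float)
    if not np.isfinite(vals).all():
        bad = [
            c for c in controls
            if not np.isfinite(df[c].to_numpy(dtype=float)).all()
        ]
        raise ValueError(f"non-finite values in control columns: {bad}")
```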
/ai-review

🔁 AI review rerun (requested by @igerber)
Performance: No findings.
Tech Debt: No findings.
Security: No findings.
Documentation/Tests: I did not rerun the test suite in this environment.
P1: trends_nonparam now rejects NaN/missing set assignments with ValueError.
P1: design2_effects always uses raw level outcomes from y_pivot (not
residualized or first-differenced Y_mat).
P2: controls=[] now raises ValueError instead of crashing on np.stack([]).
P3: summary() overall block labeled "N/A under trends_linear" when
trends + L_max>=2 (was "Cost-Benefit Delta" with NaN value).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review

🔁 AI review rerun (requested by @igerber)
P1: REGISTRY now explicitly documents that design2_effects always uses
raw (unadjusted) outcomes regardless of active controls/trends/nonparam.
For full adjusted Design-2 estimation, the paper recommends running
the standard estimator on a restricted subsample with trends_nonparam.
P3: Added test_nan_set_membership_rejected for trends_nonparam NaN guard.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review

🔁 AI review rerun (requested by @igerber)
P1: REGISTRY now documents that trends_nonparam does not enforce
Assumption 14 (common last-untreated period across sets) up front.
When within-set controls are exhausted at a given horizon, affected
switcher/horizon pairs are excluded via the existing empty-control-pool
mechanism. The effective estimand is trimmed to within-set support.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review

🔁 AI review rerun (requested by @igerber)
Methodology: Affected methods in this PR are the paper's covariate-adjusted DID (…
Documentation/Tests: Static review only; I could not run the suite because …
P1: When trends_linear=True without controls, _compute_per_period_dids
and _compute_cohort_recentered_inputs now use raw y_pivot outcomes
(not the first-differenced Z_mat). Previously the per-period path
would double-difference the already-differenced outcomes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
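A toy numeric example of the double-differencing bug described above (all numbers invented for illustration):

```python
import numpy as np

# Under trends_linear the multi-horizon path works on first differences
# Z[t] = Y[t] - Y[t-1]. Feeding Z (instead of raw Y) into a per-period
# computation that itself takes a before/after difference ends up
# differencing the data twice.
y = np.array([1.0, 3.0, 6.0, 10.0])  # raw outcome path for one group
z = np.diff(y)                        # first differences: [2, 3, 4]

per_period_from_raw = y[2] - y[1]     # intended level change: 3.0
per_period_from_z = z[2] - z[1]       # accidental second difference: 1.0
```

The fix routes the per-period path back to the raw `y_pivot` outcomes so only one difference is ever taken there.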
/ai-review

🔁 AI review rerun (requested by @igerber)
Overall Assessment: ✅ Looks good
Methodology: No unmitigated findings. The affected methods are DID^X covariate adjustment, … The paper/manual also support the PR's treatment of trend-adjusted effects as first-difference estimands cumulated back to levels and note the equal-support requirement for set-specific trends. The registry now documents the finite-sample SE normalization gap, the trimmed-support behavior under …
Performance: No findings.
Tech Debt: No findings.
Security: No findings.
Documentation/Tests: Static review only.
- Fix _compute_covariate_residualization return type annotation to
include failed_baselines set (was 2-tuple, now 3-tuple)
- Add to_dataframe("design2") level for Design-2 results
- Update to_dataframe() docstring with all new levels
- Update results dataclass docstring: replace Phase 3 placeholder
text with actual field descriptions for covariate_residuals,
linear_trends_effects, heterogeneity_effects, design2_effects
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review

🔁 AI review rerun (requested by @igerber)
Overall Assessment: ✅ Looks good
Generated with Claude Code